INTRODUCTION


In standard applications of tests for goodness of fit, the chi-square test (denoted cst hereafter) and the Kolmogorov-Smirnov test (denoted kst hereafter) are widely used. The cst can be applied in situations where the population has either a continuous or discrete distribution. On the other hand, the kst can be correctly used only in situations where the population has a continuous distribution.

Since the kst is based on the assumption of a continuous distribution, it is necessary to study whether this test can be applied in a situation in which the distribution is discrete.

For small samples from a hypothetical binomial population, the cst statistic is compared with the kst statistic. The comparison between the two tests was extended to large samples from a hypothetical multinomial population.

As  the  probability  distribution  function  is  completely  specified 
in  this  study,  estimation  of  parameters  was  not  considered. 


CHI-SQUARE TEST FOR GOODNESS OF FIT


1.1. Chi-square Test Statistic and Its Asymptotic Distribution

The n observations $(x_1, x_2, \ldots, x_n)$ in a random sample from a population are classified into k+1 mutually exclusive classes. There is some theoretical probability function which specifies the probability $p_i$ that an observation falls into the ith class. Sometimes the $p_i$ are completely specified by the probability function, sometimes they are less completely specified.

If the theoretical probability function is correct, the observed numbers follow a multinomial distribution with $p_i$ as the probability in the ith class. The joint distribution of the observations is therefore specified by the probability function:

$$P(x_1, x_2, \ldots, x_{k+1}) = \frac{n!}{x_1!\, x_2! \cdots x_{k+1}!}\; p_1^{x_1} p_2^{x_2} \cdots p_{k+1}^{x_{k+1}} \tag{1.1}$$

where $x_{k+1} = n - x_1 - x_2 - \cdots - x_k$ and $p_{k+1} = 1 - p_1 - p_2 - \cdots - p_k$.

One wants to test the null hypothesis that the observations are a random sample from the population with the specified probability distribution. As a test criterion for the null hypothesis, Karl Pearson (1) proposed the following test statistic:

$$X^2 = \sum_{i=1}^{k+1} \frac{(x_i - np_i)^2}{np_i} \tag{1.2}$$

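$X^2$ is a quadratic form in the random variables $(x_i - np_i)$, $i = 1, 2, \ldots, k+1$, with the coefficient matrix being the inverse of the covariance matrix of the multinomial distribution. Therefore another expression of $X^2$ is (2):

$$X^2 = \sum_{i,j=1}^{k+1} \sigma^{ij}\,(x_i - np_i)(x_j - np_j) \tag{1.2a}$$

where $\sigma^{ij}$ is the $(i, j)$ element of the coefficient matrix, and where

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$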

Hence if the null hypothesis is true, the limiting distribution of (1.2), as $n \to \infty$, is the chi-square distribution with k degrees of freedom, whose probability density function is (3):

$$f(u) = \frac{1}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)}\; u^{\frac{k}{2} - 1} e^{-\frac{u}{2}}, \qquad u > 0 \tag{1.3}$$

In practice, however, $X^2$ given in (1.2) is computed on the basis of a random sample; and for large n, $X^2$ is assumed to have the chi-square distribution, hence one uses its table (4) to obtain the probability:

$$P(X^2 \ge c) = \int_c^{\infty} f(u)\,du \tag{1.4}$$

where f(u) is the probability density function given in (1.3).
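As a hedged illustration of (1.2) and (1.4), the sketch below computes Pearson's statistic from class counts and evaluates the chi-square upper tail without a table. The function names are hypothetical, and the tail function only covers 1 degree of freedom and even degrees of freedom, where closed forms exist; it is not a general replacement for the table (4).

```python
import math

def pearson_chi_square(observed, probs):
    """Pearson's X^2 of (1.2): sum over classes of (x_i - n p_i)^2 / (n p_i)."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))

def chi_square_upper_tail(c, df):
    """P(chi-square with df degrees of freedom >= c), the integral in (1.4).

    Assumption of this sketch: only df = 1 (erfc form) and even df
    (finite-sum form) are handled.
    """
    if c <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(c / 2.0))
    if df % 2 == 0:
        half = c / 2.0
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j          # (c/2)^j / j!
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are supported in this sketch")
```

For example, `pearson_chi_square([13, 7], [0.5, 0.5])` gives the value of the statistic for 13 successes in 20 trials with p = 1/2, and `chi_square_upper_tail(3.841, 1)` is close to the familiar .05 level.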

1.2. Chi-square Test Statistic for the Binomial Distribution

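If a random sample of size n is drawn from the binomial distribution B(1;p), then the sample sum x has the binomial distribution B(n;p). Hence one obtains a test statistic from (1.2) for this sample sum x, and it can be written as:

$$X_b^2 = \frac{(x - np)^2}{np} + \frac{\left[(n - x) - n(1 - p)\right]^2}{n(1 - p)} = \frac{(x - np)^2}{np(1 - p)} \tag{1.5}$$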
Making use of the Table of the Binomial Distribution, one obtains the cumulative distribution of $X_b^2$; namely, for a given c one can find k such that:

$$P(X_b^2 \ge c) = P(x \ge k) = \sum_{x=k}^{n} \binom{n}{x} p^x (1 - p)^{n - x} \tag{1.6}$$

The cumulative distribution of $X_b^2$ is tabulated for n = 5, 10, 15, 20, 25, 30, and p = 1/2, in TABLE I on page 12.
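The exact tail probability of the binomial chi-square statistic can also be obtained by direct summation instead of a printed table. The following is a minimal sketch under that assumption; the function names are hypothetical. It sums binomial probabilities over every x whose statistic is at least c, so both tails are included, which for p = 1/2 is twice the one-sided sum that TABLE I records as one-half of the probability.

```python
from math import comb

def x2b(x, n, p=0.5):
    """Chi-square statistic for a binomial sample sum x: (x - np)^2 / (np(1-p))."""
    return (x - n * p) ** 2 / (n * p * (1 - p))

def exact_tail_x2b(c, n, p=0.5):
    """Exact P(X_b^2 >= c) when x follows B(n; p), summing over both tails."""
    eps = 1e-9  # guards against floating-point ties at the boundary
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
        if x2b(x, n, p) >= c - eps
    )
```

With n = 20 and c = 1.8 this reproduces the entry .2632 shown for n = 20 in Figure 1.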


Figure 1. Comparison of X_b^2 and chi-square distributions with one degree of freedom

         n = 20                    n = 30                  chi-square
      c     P(X_b^2 >= c)       c     P(X_b^2 >= c)       c     P(chi^2 >= c)

    1.800      .2632          2.133      .2004          2.706      .10
    3.200      .1154          3.333      .0988          3.841      .05
    5.000      .0414          4.800      .0428          5.412      .02
    7.200      .0118          6.500      .0162          6.635      .01
    9.800      .0026          8.533      .0052         10.827      .001
                             10.800      .0014

If the null hypothesis is true, as stated in section 1.1, $X_b^2$ has as its limiting distribution the chi-square distribution with 1 degree of freedom.

From TABLE I and Figure 1, one notes that, as n increases, the exact distribution of $X_b^2$ is fairly well approximated by the chi-square distribution with 1 degree of freedom. In other words, the exact probabilities associated with the $X_b^2$ statistic, for large n, are good approximations of the probabilities associated with the random variable whose probability density function is given in (1.3) with 1 degree of freedom.
1.3. Chi-square Test Statistic for the Multinomial Distribution

If a random sample of size n is drawn from the multinomial distribution $M(1; p_1, p_2, \ldots, p_{10})$, then the sample sum $(x_1, x_2, \ldots, x_{10})$ has the multinomial distribution $M(n; p_1, p_2, \ldots, p_{10})$, whose probability function is given in (1.1). The probability $p_i$ that an observation falls into the ith class is defined as follows:

$$p_i = \binom{10}{i - 1} \left(\frac{1}{2}\right)^{10}, \qquad i = 1, 2, \ldots, 10. \tag{1.7}$$

Hence the test statistic (1.2) for this sample can be written as:

$$X_m^2 = \sum_{i=1}^{11} \frac{(x_i - np_i)^2}{np_i} \tag{1.8}$$

where $p_{11} = 1 - p_1 - p_2 - \cdots - p_{10}$.

The cumulative distribution of $X_m^2$ may be obtained by the use of the probability function of $(x_1, x_2, \ldots, x_{10})$, but this will be very cumbersome since the number of classes is so large. Hence the Monte Carlo technique (5) was applied to get the approximate distribution of $X_m^2$.

Two examples were considered: one for sample size 1024 and the other for sample size 512. The computer (IBM 1620) was used to generate the hypothetical multinomial distribution $M(1; p_1, p_2, \ldots, p_{10})$ with the $p_i$'s being specified in (1.7)¹; hence the sample sum $(x_1, x_2, \ldots, x_{10})$ has $M(n; p_1, p_2, \ldots, p_{10})$ with n = 1024 and n = 512. This sampling and computation of $X_m^2$ were repeated a hundred times.

If the probabilities defined in (1.7) are true, the expected number of observations in each class will be as follows:

n = 1024:     1     10     45    120    210    252    210    120     45     10      1

n = 512:     .5      5   22.5     60    105    126    105     60   22.5      5     .5

¹ See Appendix
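The expected numbers above follow directly from (1.7): each is n times a binomial coefficient divided by 1024. A small sketch, with hypothetical function names, makes the arithmetic explicit.

```python
from math import comb

def class_probabilities(trials=10):
    """Probabilities C(trials, j) / 2^trials for j = 0, ..., trials, as in (1.7)."""
    return [comb(trials, j) / 2 ** trials for j in range(trials + 1)]

def expected_counts(n, trials=10):
    """Expected number of observations in each of the trials + 1 classes."""
    return [n * p for p in class_probabilities(trials)]
```

Calling `expected_counts(1024)` and `expected_counts(512)` returns the two rows listed above.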

Since the expected numbers in the classes at the extreme ends are too small for a good approximation, they were grouped with the adjacent ones (3).

If the null hypothesis is true, that is, if the sample sum $(x_1, x_2, \ldots, x_{10})$ has the multinomial distribution with the $p_i$'s being given by (1.7), the limiting distribution of (1.8) is the chi-square distribution with 8 degrees of freedom.
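The sampling experiment described in this section can be sketched as follows. This is not the original IBM 1620 program: it uses Python's built-in generator, and it assumes that grouping means merging each extreme class with its neighbour, which leaves 9 classes and is consistent with the 8 degrees of freedom stated above. All function names are hypothetical.

```python
import random
from math import comb

def grouped(values):
    """Merge each extreme class with its neighbour: 11 classes become 9."""
    return [values[0] + values[1]] + list(values[2:-2]) + [values[-2] + values[-1]]

def simulate_x2m(n, repetitions=100, seed=0):
    """Return `repetitions` simulated values of the grouped chi-square statistic."""
    rng = random.Random(seed)
    probs = [comb(10, j) / 1024 for j in range(11)]
    expected = grouped([n * p for p in probs])
    stats = []
    for _ in range(repetitions):
        counts = [0] * 11
        for _ in range(n):
            # The class is the number of 10 uniform draws that exceed 1/2.
            counts[sum(rng.random() > 0.5 for _ in range(10))] += 1
        observed = grouped(counts)
        stats.append(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))
    return stats
```

Sorting the returned values and counting how many exceed a given c yields a sample cumulative distribution of the kind tabulated in TABLE II.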

The sample cumulative distribution of $X_m^2$ was tabulated in TABLE II on page 15, for both n's. From this table it is clear that the distribution of $X_m^2$ is close to the chi-square distribution with 8 degrees of freedom. Another interpretation of this result is to say that the hypothetical distribution so generated is the specified multinomial distribution.


KOLMOGOROV-SMIRNOV TEST FOR GOODNESS OF FIT

2.1. Kolmogorov-Smirnov Test Statistic and Its Asymptotic Distribution

Let $(x_1, x_2, \ldots, x_n)$ be the n observations in a random sample from a population with a continuous cumulative distribution function F(x), which is completely specified. Define $S_n(x) = N(x)/n$, where N(x) denotes the number of $x_i$'s whose observed values do not exceed x.

Since F(x) is assumed to be continuous, $S_n(x)$ is a step function with the magnitude of the jump at each observed x being 1/n. When n is large it is nearly certain that $S_n(x)$ of the sample will be approximately equal to F(x). As the test criterion for the null hypothesis that the sample is drawn from the population with cumulative distribution function F(x), A. Kolmogorov (6) proposed the test statistic:

$$D_n = \sup_{-\infty < x < \infty} \left| S_n(x) - F(x) \right| \tag{2.1}$$

where sup is the abbreviation for supremum.
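For a continuous F(x), the supremum in (2.1) is attained at one of the sample points, either just before or just after a jump of the step function. A minimal sketch of that computation follows; the function name and the example distribution are illustrative choices, not part of the original study.

```python
def ks_statistic(sample, cdf):
    """Sup-distance between the sample step function and a continuous cdf.

    At the i-th ordered observation the step function rises from (i-1)/n
    to i/n, so both gaps are checked against the hypothesized cdf value.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For the sample (0.7, 0.1, 0.4) tested against the rectangular distribution on (0, 1), `ks_statistic([0.7, 0.1, 0.4], lambda x: x)` gives a value of about 0.3.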

If F(x) is continuous, this test statistic has the great advantage that its distribution is independent of F(x). For this reason, the kst is a distribution-free statistic.

Let y = F(x) and $y_n = S_n(x)$. Then, because F(x) is continuous, y has the rectangular distribution R(1/2, 1); and the cumulative sample distribution $G_n(y)$ of $y_n$ is a step function with n jumps of magnitude 1/n at each y. From these facts, the cumulative distribution function of the test statistic $D_n$ can be written as:

$$K_n\!\left(\frac{k}{\sqrt{n}}\right) = P\!\left(D_n \le \frac{k}{n}\right) = P\!\left(\sup_{0 < y < 1} \left| G_n(y) - G(y) \right| \le \frac{k}{n}\right) \tag{2.2}$$

Let $I_1, I_2, \ldots, I_n$ be n intervals defined on (0, 1] as $I_x = \left((x-1)/n,\; x/n\right]$ where x = 1, 2, ..., n. Let $(r_1, r_2, \ldots, r_n)$ be a random variable (degenerate with $r_1 + r_2 + \cdots + r_n = n$) denoting the numbers in the sample $y_1, y_2, \ldots, y_n$ falling into $I_1, I_2, \ldots, I_n$, respectively. The r's, of course, have an (n-1)-dimensional multinomial distribution whose probability function is given by:

$$p(r_1, r_2, \ldots, r_n) = \frac{n!}{r_1!\, r_2! \cdots r_n!} \cdot \frac{1}{n^n} \tag{2.3}$$


Now the random variable $(r_1, r_2, \ldots, r_n)$ uniquely determines $G_n(y)$, hence the value of $P(D_n \le k/n)$ is determined accordingly by summing (2.3) over all points in the sample space of $(r_1, r_2, \ldots, r_n)$ for which $|G_n(y) - G(y)| \le k/n$ for all y.

When n is large the distribution function given in (2.2) tends to

$$K(\lambda) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2} \tag{2.4}$$

uniformly with respect to $\lambda$ (10). Some values of the distribution (2.4) have been tabulated by Smirnov (11). The distribution of this statistic for finite n given in (2.2) has been tabulated by Massey (7), and by Birnbaum and Tingey (8), (9).
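The limiting distribution (2.4) is easy to evaluate numerically, because the terms for k and -k are equal and decay very quickly for moderate values of the argument. The sketch below uses a fixed number of terms, which is an assumption that works for arguments of ordinary size but would need more terms very close to zero; the function name is hypothetical.

```python
import math

def kolmogorov_cdf(lam, terms=100):
    """Evaluate K(lam) = sum over all integers k of (-1)^k exp(-2 k^2 lam^2).

    The k = 0 term equals 1 and the terms for +k and -k coincide, so the
    series is rewritten as 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2).
    """
    if lam <= 0:
        return 0.0
    s = 0.0
    for k in range(1, terms + 1):
        s += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return 1.0 - 2.0 * s
```

For instance, `kolmogorov_cdf(1.36)` is close to .95, so that 1.36 divided by the square root of n approximates the .05 critical value of the statistic for large n; with n = 1024 this gives about .0425.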

Without the assumption that F(x) is continuous, however, the distribution function in (2.2) has its limiting distribution $K(\lambda)$ in (2.4) (6). But the limiting distributions are no longer independent of F(x). They depend on the value of F(x) at the discontinuity points, but not on the form of the function between the points of discontinuity (12).

2.2.    Kolmogorov-Smirnov  Test  Statistic  for  the  Binomial  Distribution 

As mentioned in section 2.1, the distribution of $D_n$ is based on the assumption of a continuous F(x). It was, however, of interest to see how good the kst was if one applied it to a discrete distribution.

Consider the sample sum x of a random sample from the binomial distribution B(1;p) described in section 1.2. From (2.1), the test statistic for this sample can be written as:

$$D_b = \sup \left| \frac{x}{n} - p \right| \tag{2.5}$$

where p = 1/2.

Making use of the Table of the Binomial Distribution together with (2.5), the cumulative distribution of $D_b$ for n = 5, 10, 15, 20, 25, 30, and p = 1/2, was tabulated in TABLE I on page 12.

However, if the null hypothesis is true, and if F(x) is continuous, the cumulative distributions of $D_b$ and $D_n$ must be fairly close together. Due to the fact that F(x) is discrete, comparison between the two distributions shows that the value of k/n for $D_b$ is significantly lower than that of $D_n$.

Figure 2 shows the discrepancies between the two distributions at lower probability levels. In other words, if one uses the critical value of $D_n$ with a given level of significance, say $\alpha$, to test for goodness of fit in the binomial distribution, the actual level is significantly lower than the original choice.

Figure 2. Comparison of D_b and D_n distributions

n = 20

     k/n    P(D_b >= k/n)      k/n    P(D_n >= k/n)

    .100       .5034          .231        .20
    .150       .2632          .246        .15
    .200       .1154          .264        .10
    .250       .0414          .294        .05
    .300       .0118          .356        .01

n = 30

     k/n    P(D_b >= k/n)      k/n    P(D_n >= k/n)

    .067       .5846          .19         .20
    .100       .3616          .20         .15
    .133       .2004          .22         .10
    .167       .0988          .24         .05
    .200       .0428          .29         .01
    .233       .0162
    .267       .0052
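The entries of Figure 2 for the binomial statistic can be checked by direct summation, which also shows how far the actual level falls below the nominal one. The sketch below is an illustration with a hypothetical function name; it evaluates the exact probability that the binomial statistic of (2.5) reaches a given value.

```python
from math import comb

def db_tail(d, n, p=0.5):
    """Exact P(D_b >= d), where D_b = |x/n - p| and x follows B(n; p)."""
    eps = 1e-12  # guards against floating-point ties at the boundary
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
        if abs(x / n - p) >= d - eps
    )
```

Using the tabled .05 critical value .294 for n = 20, `db_tail(0.294, 20)` comes out near .0118, the same probability that Figure 2 lists for k/n = .300.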


2.3. Kolmogorov-Smirnov Test Statistic for the Multinomial Distribution

Consider the multinomial distribution $M(n; p_1, p_2, \ldots, p_{10})$ stated in section 1.3. If the probabilities, $p_i$'s, are correct as specified in (1.7), one has the following cumulative distributions for n = 1024 and n = 512 respectively:

n = 1024:     1     11     56    176    386    638    848    968   1013   1023   1024

n = 512:     .5    5.5     28     88    193    319    424    484  506.5  511.5    512

Hence from (2.1), the test statistic for this sample can be expressed as:

$$D_m = \sup \left| S_n(x) - F(x) \right| \tag{2.6}$$

One may be able to obtain the probability distribution of $D_m$ by direct computation using (1.1); but, as pointed out in section 1.3, direct computation becomes cumbersome. For those samples obtained in section 1.3, one computed $D_m$ given in (2.6), hence it was possible to form a sample cumulative distribution of the test statistic $D_m$. The sample cumulative distribution of $D_m$ is tabulated in TABLE II on page 15.
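Because both the sample step function and the hypothesized discrete F(x) jump only at the class points, the supremum in (2.6) reduces to a maximum over the 11 classes. A minimal sketch of that computation from a vector of class counts is given below; the function name is hypothetical and the class probabilities are those of (1.7).

```python
from itertools import accumulate
from math import comb

def dm_statistic(counts):
    """Largest gap between observed and hypothesized cumulative proportions.

    `counts` holds the observed numbers in the 11 classes, in order.
    """
    n = sum(counts)
    probs = [comb(10, j) / 1024 for j in range(11)]
    observed_cdf = [c / n for c in accumulate(counts)]
    hypothesized_cdf = list(accumulate(probs))
    return max(abs(s - f) for s, f in zip(observed_cdf, hypothesized_cdf))
```

Applied to each simulated sample of section 1.3, this function yields the values from which a sample cumulative distribution can be formed.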

Comparison of the two distributions, $D_m$ and $D_n$, shows that the value of k/n for $D_m$ is lower than that for $D_n$. The reasons for the lower k/n for $D_m$ may be either that F(x) is discrete or that $D_m$ is a random variable whose asymptotic distribution is defined as (2.4), or both. Either stronger justification or proper modification is needed in order to apply the kst in the situation of discrete F(x), especially with small n.

COMPARISON OF THE CHI-SQUARE AND KOLMOGOROV-SMIRNOV TEST STATISTICS

For the application of the cst for goodness of fit, appropriate grouping is needed. Mann and Wald (13) have given a technique for deciding on an optimum number of class intervals for the cst applications. Grouping observations into intervals for the kst tends to lower the value of k/n in (2.2). Examples given in previous sections indicate that the k/n for both $D_b$ and $D_m$ are lower than those tabled, but the distributions of $X_b^2$ and $X_m^2$ are well approximated by the chi-square distribution. Hence for the discrete distribution, the kst is conservative.

The kst is correctly used only when the distribution is continuous and completely specified. The distribution of $D_n$ is, therefore, not known when certain parameters of the population have to be estimated from the sample, but one may safely conclude that the discrepancy between the sample distribution and the hypothetical distribution is significant if the value of $D_n$ exceeds the table value (15). The cst is, however, easily modified by reducing the number of degrees of freedom and can be applied to the situation where the estimation of parameters is needed.

The kst will usually require less computation than the cst. The kst treats individual observations and thus does not lose information by grouping, as the cst necessarily does. With small samples this loss of information in cst procedures is large, so use of the cst is not advisable (15).

The kst, at least at the 50% power level, will detect smaller deviations in the cumulative distribution than will the cst (15). In general, the power of the cst is not known (14), whereas a lower bound of the power of the kst can be computed for any alternative (15). However, if the kst is applied to a discrete population, nothing can be said about its power. Also, the fact that one obtained lower k/n, as pointed out above, explains the reason not to use the kst in the discrete situation.
TABLE I

The entries in this table are c (1st column), k/n (2nd column), and one-half of the probability¹ (3rd column) that $X_b^2$ and $D_b$ are greater than or equal to c and k/n respectively for each n.

n = 5

        c          k/n          p

      .2000       .1000       .5000
     1.8000       .3000       .1875
     5.0000       .5000       .0313

n = 10

        c          k/n          p

      .4000       .1000       .3770
     1.6000       .2000       .1719
     3.6000       .3000       .0547
     6.4000       .4000       .0107
    10.0000       .5000       .0010

n = 15

        c          k/n          p

      .0667       .0333       .5000
      .6000       .1000       .3036
     1.6667       .1667       .1509
     3.2667       .2333       .0592
     5.4000       .3000       .0176
     8.0667       .3667       .0037
    11.2667       .4333       .0005

¹ Table of the Binomial Distribution, Department of Commerce, National Bureau of Standards, Applied Mathematics Series No. 6.

The larger values were omitted for n = 15, 20, 25, 30.


TABLE I (cont.)

n = 20

        c          k/n          p

      .2000       .0500       .4119
      .4000       .1000       .2517
     1.8000       .1500       .1316
     3.2000       .2000       .0577
     5.0000       .2500       .0207
     7.2000       .3000       .0059
     9.8000       .3500       .0013
    12.8000       .4000       .0002

n = 25

        c          k/n          p

      .0400       .0200       .5000
      .3600       .0600       .3450
     1.0000       .1000       .2122
     1.9600       .1400       .1148
     3.2400       .1800       .0539
     4.4800       .2200       .0216
     6.7600       .2600       .0073
     9.0000       .3000       .0020
    11.5600       .3400       .0005

n = 30

        c          k/n          p

      .1333       .0333       .4278
      .5333       .0667       .2923
     1.2000       .1000       .1808
     2.1333       .1333       .1002
     3.3333       .1667       .0494
     4.6000       .2000       .0214
     6.5000       .2333       .0081
     8.5333       .2667       .0026
    10.8000       .3000       .0007
    13.3333       .3333       .0002
TABLE II

The entries in this table are c (1st column), k/n (3rd column), and $P(X_m^2 \ge c)$ and $P(D_m \ge k/n)$ (2nd and 4th columns) respectively for n = 512 and n = 1024.

n = 1024

        c      P(X_m^2 >= c)      k/n     P(D_m >= k/n)

      1.646        1.00          .0068        1.00
      2.032         .99          .0087         .96
      2.733         .96          .0107         .88
      3.490         .92          .0136         .79
      4.594         .78          .0146         .71
      5.527         .70          .0156         .07
      7.344         .48          .0166         .56
      9.524         .26          .0175         .50
     11.030         .17          .0195         .41
     13.362         .06          .0214         .32
     15.507         .03          .0234         .21
     18.168         .01          .0263         .10
     20.090         .00          .0322         .05
     26.125         .00          .0425         .01
                                 .0509         .00

        c      P(chi^2 >= c)      k/n     P(D_n >= k/n)

     11.030         .20          .0334         .20
     13.362         .10          .0356         .15
     15.507         .05          .0381         .10
     18.168         .02          .0425         .05
     20.090         .01          .0509         .01

TABLE II (cont.)

n = 512

        c      P(X_m^2 >= c)      k/n     P(D_m >= k/n)

      1.646         .99          .0078        1.00
      2.032         .98          .0117         .96
      2.733         .93          .0136         .93
      3.490         .88          .0175         .79
      4.594         .73          .0195         .68
      5.527         .59          .0214         .58
      7.344         .43          .0253         .44
      9.524         .30          .0312         .33
     11.030         .23          .0332         .22
     13.362         .06          .0390         .15
     15.507         .04          .0410         .11
     18.168         .01          .0429         .09
     20.090         .01          .0449         .03
     26.125         .00          .0546         .01
                                 .0601         .00
                                 .0720         .00

        c      P(chi^2 >= c)      k/n     P(D_n >= k/n)

     11.030         .20          .0473         .20
     13.362         .10          .0504         .15
     15.507         .05          .0539         .10
     18.168         .02          .0601         .05
     20.090         .01          .0720         .01
APPENDIX

Generation of Hypothetical Distribution by Computer

One selects two 2k-digit numbers, say m and n, for k sufficiently large (say 5 or larger), multiplies m by n, and extracts the middle 2k digits, which replace n. The 2k digits extracted from the middle of the product of m by n have the rectangular distribution R(1/2, 1) (17).

Repetition of the above process will give as many random numbers as one wants. Call this random number r, then generate 10 r's and compare them with 1/2. Let x be the number of r's which exceed 1/2, and let $x_i = 1$ if x = i - 1 and $x_i = 0$ otherwise; then $(x_1, x_2, \ldots, x_{10})$ has the multinomial distribution $M(1; p_1, p_2, \ldots, p_{10})$ where the $p_i$'s are given in (1.7).
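The procedure of this appendix can be sketched in a few lines. This is an illustration only, not the original IBM 1620 routine: the function names, the multiplier, and the starting value are hypothetical, the extracted digits are scaled to the unit interval before the comparison with 1/2, and generators of this middle-digits kind can degenerate, so the sketch should not be read as a recommended source of random numbers.

```python
def next_number(m, n, k):
    """Middle 2k digits of the product m * n, read as a 4k-digit number."""
    return (m * n // 10 ** k) % 10 ** (2 * k)

def class_draws(m, seed, k, draws):
    """Yield class indices i = x + 1, where x counts the r's above 1/2 among 10."""
    n = seed
    scale = 10 ** (2 * k)
    for _ in range(draws):
        x = 0
        for _ in range(10):
            n = next_number(m, n, k)   # the extracted digits replace n
            if n / scale > 0.5:        # r is the extracted number scaled to (0, 1)
                x += 1
        yield x + 1
```

For example, with k = 2 the product 1234 x 5678 = 07006652 has middle four digits 0066, so `next_number(1234, 5678, 2)` returns 66.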
The chi-square test and the Kolmogorov-Smirnov test are widely used for testing goodness of fit. The former can be applied in situations where the population has either a continuous or discrete distribution, and the latter can be correctly used only in situations where the population has a continuous distribution.

Since the Kolmogorov-Smirnov test is based on the assumption of a continuous distribution, it was of interest to see whether this test may be applied in a situation where the distribution is discrete. Two completely specified discrete distributions were considered.

The exact probability distributions of the chi-square test statistic and the Kolmogorov-Smirnov test statistic were tabulated and compared for small samples (n ≤ 30) from a completely specified binomial population.

The comparison of the two test statistics was extended to large random samples (n = 512, 1024) from a completely specified multinomial population. The approximate distributions of the chi-square test statistic and the Kolmogorov-Smirnov test statistic were obtained by the Monte Carlo technique.


